the papers both individually and collectively. Their importance is discussed in terms of each paper’s 
contribution to the general research on this topic and each paper’s potential to inform educational 
policy. In addition, the papers reflect our shared thinking about VAMs, VAM output, and the 
inference-based decisions for which VAMs are increasingly being used. 

Keywords: Value-added; teacher evaluation; teacher accountability. 

Value-Added Model (VAM) Research for Educational Policy: 
Framing the Discussion. 

Abstract: In this article, the guest editors of this EPAA/AAPE special issue on Value-Added Model (VAM) research for educational policy (1) present the background and political context surrounding the use of VAMs in teacher evaluation and accountability in the United States, (2) summarize the five research papers and the commentary selected for inclusion in this special issue, and (3) discuss the relevance of the papers both individually and collectively. Their importance is discussed in terms of each paper’s contribution to the general research on this topic and each paper’s potential to inform educational policy. In addition, the papers reflect our thinking about VAM output and the VAM-based decisions for which VAMs are increasingly being used. 

Keywords: value-added models; teacher evaluation; teacher accountability. 

Value-Added Model (VAM) Research for Educational Policy: 
Framing the Discussion. 

Abstract: In this article, the guest editors of this EPAA/AAPE special issue on Value-Added Model (VAM) research for educational policy (1) present the political context surrounding the use of VAMs in teacher evaluation and accountability in the United States, (2) summarize the five research papers and the commentary selected for inclusion in this special issue, and (3) analyze the relevance of the selected papers both individually and collectively. Their importance is discussed in terms of each article’s contribution to the general research on this topic and each article’s potential to inform educational policy. In addition, the articles reflect our thinking about VAMs and VAM-based decision-making, and about how VAMs are increasingly being used. 

Keywords: value-added; teacher evaluation; teacher accountability. 

Introduction 

Historically, throughout the United States, public education agencies have used localized 
approaches for evaluating teachers and making determinations about teacher effectiveness. With few 
exceptions (e.g., the state of Tennessee), teacher evaluation efforts have traditionally been governed 
and developed by school districts under the guise of district control. Accordingly, school districts 
have not been encouraged or incentivized to conform to any particular teacher accountability 
framework. Now, however, evaluating teachers using value-added models (VAMs)¹ has become a 
matter of federal and state education policy, as well as a matter of federal and state educational 
urgency and resolve (Corcoran, 2010; Stumbo & McWalters, 2011; U.S. Department of Education, 
2009a). 

¹ While Student Growth Models (SGMs) are being used more often than Value-Added Models (VAMs) at the state level 
(Collins & Amrein-Beardsley, under review), VAM is the more popularly used term. As such, we refer to VAMs 
throughout this manuscript in their most general form, acknowledging that there are distinct differences among 
and between specific VAMs and SGMs, as well as in their model specifications, the methodologies and statistics used, 
and the assumptions upon which they are based. 

Encouraged by over $350 million in federal funds through President Obama’s Race to the Top 
(RttT) competition, states are exploring methods to capture the value a teacher adds to student 
learning from one year to the next (i.e., a teacher’s value-added). To date, 18 states, the District of 
Columbia (D.C.), and 16 school districts across the country have won RttT funding to support these 
efforts (U.S. Department of Education [USDOE], 2012a, 2012b). As a result, education agencies 
are increasingly developing teacher accountability systems based in large part on measures of 
academic growth that can be attributed to teachers’ effectiveness (USDOE, 2009b). Additionally, 44 
states and D.C. have applied for No Child Left Behind (NCLB) waivers (Philips, 2012), which would excuse 
them from NCLB’s goal that 100% of the students in their public schools be 
academically proficient by 2014. In exchange for these waivers, these states have also 
agreed to adopt stronger teacher accountability mechanisms, again based in large part on the growth 
students demonstrate as measured via VAMs. 

By definition, VAMs are designed to isolate and measure teachers’ contributions to student 
learning and achievement on large-scale standardized tests as groups of students move from one 
grade level to the next. Statisticians measure value-added by mathematically calculating the “value” a 
teacher “adds to” or “detracts from” student achievement scores over time, as compared to 
teachers with “similar” students. Purportedly, VAMs allow for richer analyses of achievement data 
by tracking students’ learning trajectories from the time they enter a classroom to the time they leave. 
In addition, VAMs have arguably improved upon the educational measurement systems previously 
used for test-based accountability (see, for example, Capitol Hill Briefing, 2011; Harris, 2011). 
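To make the idea of comparing a teacher’s students with “similar” students more concrete, the following is a deliberately minimal sketch in Python. It assumes a single covariate (each student’s prior-year score), a pooled ordinary least squares regression, and a teacher effect defined as the mean residual of that teacher’s students; the function name and data are hypothetical. Operational models, including TVAAS and the student growth models noted above, use considerably more elaborate specifications, so this illustrates only the general logic, not any particular VAM.

```python
from collections import defaultdict
from statistics import mean


def simple_value_added(records):
    """Hypothetical toy value-added estimate (not any operational VAM).

    Fits a pooled OLS regression of current score on prior score, then
    averages each teacher's student residuals.
    records: iterable of (teacher_id, prior_score, current_score) tuples.
    """
    records = list(records)
    priors = [prior for _, prior, _ in records]
    currents = [current for _, _, current in records]
    x_bar, y_bar = mean(priors), mean(currents)

    # Pooled OLS slope and intercept for current ~ prior (all students, all teachers).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(priors, currents))
    sxx = sum((x - x_bar) ** 2 for x in priors)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual = actual score minus the score predicted from the prior score,
    # i.e., how far a student landed above or below "similar" students.
    residuals = defaultdict(list)
    for teacher, x, y in records:
        residuals[teacher].append(y - (intercept + slope * x))

    # Teacher "value-added" = average residual of that teacher's students.
    return {teacher: mean(res) for teacher, res in residuals.items()}


# Hypothetical data: two teachers whose students have identical prior scores.
data = [
    ("A", 40, 45), ("A", 60, 65),
    ("B", 40, 35), ("B", 60, 55),
]
print(simple_value_added(data))  # → {'A': 5.0, 'B': -5.0}
```

Because the two hypothetical teachers’ students start from the same prior scores, the residuals separate cleanly here; with real data, which students a teacher is assigned and which covariates the model includes both shape the estimate.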

As such, it seems logical to use VAM output as an integral component of 
contemporary teacher accountability policies, and states are increasingly incorporating VAM 
components within their teacher evaluation frameworks. But the fact that this seems logical 
does not mean it works. It is therefore important to examine whether integrating VAM output into 
teacher accountability policies works in the ways theorized (e.g., by those securing state and 
federal contracts to conduct this work, and by the federal government via initiatives such as RttT and 
NCLB waivers). Furthermore, it is critical to examine whether VAMs work well enough to support 
highly consequential decisions about teachers (e.g., publishing teachers’ names alongside their VAM scores, or 
using VAM output as a significant factor in decisions about tenure, merit pay, or 
continued employment). 

Calls for all types of research on VAMs are urgent and pertinent, especially as policy 
development continues to focus on VAMs with unwarranted levels of certainty and conviction (see, 
for example, Schafer, Lissitz, Zhu, Zhang, Hou, & Li, 2012). To this end, five manuscripts and one 
commentary are featured in this special issue of Education Policy Analysis Archives (EPAA). 
Collectively, the authors present evidence-based arguments about VAMs and their use in local, state, 
and national policy contexts. Each in their own way, and with distinct methods of inquiry, the authors 
advance our thinking about the proper use and role of VAMs in educational policy. 

Special Issue Summaries 

Today, EPAA features Diana Pullin’s (Boston College) research paper, Legal Issues in the 
Use of Student Test Scores and Value-Added Models (VAM) to Determine Educational Quality. Pullin 
addresses the changing legal landscape associated with policy-based, high-stakes teacher evaluation 
systems that use VAMs. She argues that when policy-based evaluation systems explicitly require the use of 
score-based measures, the data quality standards of those measures may be subjected to critical 
examination. In addition, given the conflicting perspectives currently held by researchers regarding 
VAMs, plaintiffs in legal proceedings have begun to argue that such methods may be insufficient or 
otherwise inadequate for discriminating among levels of professional quality in support of high-stakes 
employment decisions. As a result, courts may burden education agencies by requiring them to 
provide substantive evidence of reliability and validity. Technical and inferential limitations of VAM-based 
evaluation ratings may also raise legal considerations pertaining to substantive due process and 
equal protection, infringement on individual civil rights, and commercial 
liability (e.g., test design). 

Education Policy Analysis Archives Vol. 21 No. 4 
SPECIAL ISSUE 

Also today, in a related article, EPAA features The Legal Consequences of Mandating High Stakes 
Decisions Based on Low Quality Information: Teacher Evaluation in the Race-to-the-Top Era, authored by Bruce 
Baker (Rutgers, The State University of New Jersey), Joseph Oluwole (Montclair State University), 
and Preston Green (The Pennsylvania State University). Contextualizing their analysis in a review of district and 
state policies related to teacher evaluation and general human resource decision-making, Baker, 
Oluwole, and Green provide many reasons to question the usability of student growth models and VAMs 
for such purposes. Like Pullin, they also focus on identifying potential legal ramifications and other 
consequential outcomes of these policies and practices if such considerations are not taken seriously. 

Tomorrow, EPAA will feature an in-depth analysis conducted by Nicole Kersting and Mei-Kuang 
Chen (University of Arizona) and James Stigler (University of California, Los Angeles) titled 
Value-Added Teacher Estimates as Part of Teacher Evaluations: Exploring the Effects of Data and Model 
Specifications on the Stability of Teacher Value-Added Scores. Kersting, Chen, and Stigler investigate the 
extent to which three VAM specifications affect the overall stability of value-added output (with 
implications for validity). The three specifications they examine include methodological variations in 
(1) accounting for students’ academic and other background variables, (2) using single or multiple 
cohorts of students to measure teacher value-added, and (3) sample sizes and their 
related standard errors, all of which ultimately affect VAM output and teacher-level 
classifications. Like others, they find issues with stability (Koedel & Betts, 2005; Schochet & Chiang, 
2010), sample sizes (Lockwood, McCaffrey, & Sass, 2008; McCaffrey, Lockwood, Koretz, & 
Hamilton, 2003; Nelson, 2011), and other data and model specifications (Harris, Sass, & Semykina, 
2012; Papay, 2011). The paper adds important evidence to support the argument that before VAM 
data can be used for consequential purposes, such issues must be addressed and taken much more 
seriously than they currently are. 
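Why sample size matters for stability can be seen in a simple textbook variance calculation, sketched here in Python. It assumes each student’s residual equals a true teacher effect plus independent student-level noise, so the squared standard error of a class mean is the noise variance divided by the number of students; the reliability of the teacher estimate is then the share of its variance due to true teacher differences. If true effects were also stable across years and errors independent between years, this ratio would equal the expected year-to-year correlation of the estimates. The function name and variance values are hypothetical, chosen for round numbers, and are not drawn from Kersting et al.

```python
def estimate_reliability(teacher_var, student_resid_var, n_students):
    """Reliability of a teacher's mean-residual estimate under a simple model.

    Assumes residual = true teacher effect + independent student noise, so the
    squared standard error of a class mean is student_resid_var / n_students.
    """
    sampling_var = student_resid_var / n_students  # squared standard error
    return teacher_var / (teacher_var + sampling_var)


# Hypothetical variances (squared score units): true teacher effects have
# variance 1.0; student-level noise has variance 25.0.
for n in (10, 25, 100):
    print(n, round(estimate_reliability(1.0, 25.0, n), 3))
# prints:
# 10 0.286
# 25 0.5
# 100 0.8
```

Under these assumed values, a teacher with ten tested students has an estimate that is mostly noise, which is one reason pooling multiple cohorts, one of the specifications Kersting et al. examine, can change results.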

On Wednesday, EPAA will feature Ecologies of Education Quality authored by Elizabeth Graue, 
Katherine Delaney, and Anne Karch (University of Wisconsin, Madison). In this qualitative piece, 
Graue, Delaney, and Karch confront the challenging task of dissecting the influence of school and 
community contexts on outcome measures of educational quality in general. Using value-added and 
observational data together, including both teacher- and school-level data from four school sites in 
one metropolitan area, they contrast how these evaluative measures are operationalized within the 
contexts in which the measurements are constructed, interpreted, and used. The authors provide 
readers with an in-depth review of the complexities involved, noting in particular the inadequacy of 
these outcome measures for representing all aspects of teacher quality. They ultimately argue that 
even with the most sophisticated technical controls, VAMs cannot adequately capture or control for 
all of the contextual variables involved. 

On Thursday, EPAA will feature Sentinels Guarding the Grail: Value-Added Measurement and the 
Quest for Education Reform authored by Rachael Gabriel (University of Connecticut) and Jessica Nina 
Lester (Washington State University). Gabriel and Lester present another qualitative analysis, this 
time of a series of state-level policy discussions surrounding the use of value-added data, 
specifically data derived via the Tennessee Value-Added Assessment System (TVAAS). Using discourse 
analysis methods, the authors demonstrate how value-added is presented to, and received by, both 
the public and policymakers as scientific, objective, accurate, and efficient. In addition, Gabriel and 
Lester highlight the numerous concerns and cautions issued by educational researchers and critics of 
VAMs (e.g., lack of reliability and validity, bias, errors, and inherent problems with, and overreliance on, 
achievement tests), and they show how these issues are often overlooked and dismissed at 
multiple policy levels. 

On Friday, EPAA will close the special issue with a commentary authored by Moshe 
Adler (Columbia University) titled “Findings vs. Interpretation in ‘The Long-Term Impacts of 
Teachers’ by Chetty et al.” Here, Adler critiques the acclaimed, overly publicized, and hotly 
contested study of New York City teachers’ value-added and long-term impacts conducted 
by Chetty, Friedman, and Rockoff (2011). Specifically, Adler highlights the omissions, 
fallacies, and misrepresentations made by the authors, all of which resulted in unwarranted national 
and international attention and misled policymakers and the public alike, especially in terms of the 
value and potential (yet exaggerated) powers of VAMs. Adler also makes the case that had Chetty et 
al.’s work been peer reviewed prior to its public release (see also Lowrey, 2012), it likely would never 
have had the impact it did, given the serious methodological and other shortcomings highlighted 
here and elsewhere (Ballou, 2012; Ravitch, 2012; Winerip, 2012). 

In sum, the papers featured as part of the EPAA special issue on VAMs highlight the 
ways in which education agencies across the United States are increasingly aligning their policy 
interests with, and increasing their dependence on, VAMs; that is, using VAMs to assess teacher quality and 
effectiveness as well as to hold teachers accountable. Yet while a growing volume of published 
research has revealed substantive methodological and inferential concerns about evaluating teacher 
effectiveness using VAM approaches (see, for example, Amrein-Beardsley, 2008; Corcoran, 2010; 
Darling-Hammond, Amrein-Beardsley, Haertel, & Rothstein, 2012; Muijs, 2006; McCaffrey, 2003; 
Papay, 2012; Raudenbush, 2004; Rivkin, 2007; Rothstein, 2009; Rubin, Stuart, & Zanutto, 2004; 
Scherrer, 2011; Schochet & Chiang, 2010; Zeis, Waronska, & Fuller, 2009), this dialogue is 
unfortunately taking place among academic researchers rather than between academic researchers and 
policymakers. The translation of this research into practice and policy has yet to be realized 
in any significant way. Much more effort needs to be placed on sharing these results with 
policymakers (see, for example, Capitol Hill Briefing, 2011). 

Indeed, the goal of this EPAA collection is to present a series of research-based studies that 
are peer-reviewed but also freely and openly accessible to all, including those at multiple 
policy levels. In addition, because EPAA gives policymakers direct access to research 
and research-based recommendations, in this case about a widely (and wildly) popular educational 
policy, we hope this too will help to further inform our collective thinking about VAMs. 
Policymakers and others outside of academia must begin to better understand the methods, model 
specifications, and assumptions being made, as well as the appropriate interpretations and inference-based 
uses of VAM output. 


Framing the Issue 

With this in mind, we next present the major and minor themes prevalent across the papers. We 
do this particularly so that policymakers and others might be better equipped to delve further 
into the very real issues and concerns that come along with the adoption and implementation of 
VAMs. 

First, all of the papers trace their lineage back to the 2009 federal RttT initiative, which 
represented a critical transition point in public education policy. RttT popularized VAMs as 
policy tools pitched for stronger accountability and radical educational reform. 
The widespread and rapid acceptance of VAMs that followed is an underlying theme across the papers presented in this 
special issue. 

Second, all papers featured in this special issue are relevant to educational policymakers, 
particularly as all of the contributing researchers address the same topic, each in their own 
way. While each uses a distinct method of inquiry, the underlying purpose is to help others better 
understand and make sense of VAMs’ intended and unintended effects. Individually and 
collectively, the papers should also help others to better ascertain whether VAMs indeed work in the ways 
theorized. Unique contributions here come from the qualitative pieces offered by Graue et al. and 
Gabriel and Lester. These are unusual because qualitative research on this topic is rarely 
published. Also distinctive are the law-based pieces put forth by Pullin and Baker et al. These are 
distinctive because we are only just beginning to understand the legal ramifications that might accompany 
VAM use in general and, more importantly, its use for highly consequential decision-making. 
In some cases, we are already beginning to witness the impact that VAM use (and abuse) 
can have on the personal and professional lives of public school teachers (Amrein-Beardsley & 
Collins, 2012). 

Also of great importance is Kersting et al.’s scholarly contribution. Through their examination of 
the methodological issues often associated with VAM-based ratings, they add rich evidence 
supporting similar concerns about model stability, the substantial errors that limit VAMs’ 
practicality, and the ways in which model specifications can compromise validity. Adler, in turn, 
adds to our thinking about how technical details matter, especially in a case in which VAM-based 
findings may have led to erroneous conclusions about the long-term instructional effects of 
teachers. Adler highlights the potential for such judgments to lodge themselves improperly within the 
prevailing policy ethos, especially if research-based findings are publicized before being peer-reviewed 
by the researchers best equipped to judge their merits. 

A third perspective positions the papers within a larger conceptual validation study. That is, 
findings from each combine to form a larger conceptual framework for assessing the suitability of 
VAMs as tools for making consequential decisions about teachers and their effectiveness. In this 
regard, the stability of VAM ratings is examined by Kersting et al., while Baker et al. question the 
wisdom of assigning pre-specified weights to VAM components. In addition, Graue et al. look at 
resource allocation and cultural coherence as unmeasured, construct-relevant factors affecting 
student learning, and Gabriel and Lester highlight the absence of critical reflection on the part of 
policymakers regarding VAM-based measures. Pullin examines the legal consequences emanating from 
the methodological problems associated with VAMs, and Adler’s examination of Chetty et al.’s 
(2011) study reveals that threats to validity may be internal, due to the improper application and 
interpretation of methodological approaches. Collectively, these papers provide a more nuanced, 
multifaceted view of VAMs as they are currently situated within the larger policy context of stronger 
accountability, and as they are currently positioned as America’s ideal mechanisms for promoting meaningful 
educational reform. 


Conclusion 

A close read of these articles reveals the tension between the policy and research 
communities regarding VAMs and the evaluation of teacher effectiveness and educational quality. On 
one hand, policymakers throughout the country are increasingly embedding score-based (VAM) 
approaches within educational evaluation and accountability systems. On the other hand, social 
science researchers are increasingly questioning the methodological, technical, and inferential 
attributes of these same VAM approaches. 

Gabriel and Lester use the phrase “sentinel of trust” to reflect the degree to which policymakers 
have come to accept VAMs as objective, reliable, and valid measures of teacher quality. At the same 
time, they note how the same audience ignores the technical and methodological issues examined by 
some of the papers here and elsewhere. This is due to what Graue et al. call policymakers’ 
“insatiable appetite[s]” (p. 2) for quality, objective indicators in education. 

It is in this context that these research-based papers are presented to readers, individually 
and collectively, as these papers stand to “add value” to the literature regarding educational policy, 
high-stakes accountability, and teacher evaluation in general. Specifically, these papers stand to “add 
value” by giving policymakers, their affiliates, and others easier access to a series of 
diverse, research-based contributions about how we might proceed more wisely in our thinking 
about VAMs and their use. 